Delay-Resistant Geo-Distributed Analytics

Abstract

Big data analytics platforms have played a critical role in the unprecedented success of data-driven applications. However, real-time and streaming applications, and recent legislation, e.g., GDPR in Europe, have posed constraints on exchanging and analyzing data, especially personal data, across geographic regions. To address such constraints, data has to be processed and analyzed in-situ, and aggregated results have to be exchanged among the different sites for further processing. This introduces additional network delays due to the geographic distribution of the sites, potentially affecting the performance of analytics platforms that are designed to operate in datacenters with low network delays. In this paper, we show that the three most popular big data analytics systems (Apache Storm, Apache Spark, and Apache Flink) fail to tolerate round-trip times of more than 30 milliseconds, even when the input data rate is low. The execution time of distributed big data analytics tasks degrades substantially after this threshold, and some of the systems are more sensitive than others. A closer examination and understanding of the design of these systems shows that there is no winner in all wide-area settings. However, we show that it is possible to improve the performance of all these systems significantly amid transcontinental delays (where inter-node delay is more than 30 milliseconds) and achieve performance comparable to that within a datacenter for the same load.

Similar resources

Bohr: Similarity Aware Geo-distributed Data Analytics

We propose Bohr, a similarity aware geo-distributed data analytics system that minimizes query completion time. The key idea is to exploit similarity between data in different data centers (DCs), and transfer similar data from the bottleneck DC to other sites with more WAN bandwidth. Though these sites have more input data to process, these data are more similar and can be more efficiently aggr...

Low Latency Geo-distributed Data Analytics – Public Review

Large cloud service providers ingest massive amounts of data in geographically distributed sites spread across the globe. Analytics for such planetary-scale datasets is an important emerging challenge. The current practice is to copy all data to a central location, where it can be dealt with locally by standard data analytics stacks such as Hadoop and Spark. However, transferring large volumes ...

WANalytics: Analytics for a Geo-Distributed Data-Intensive World

Large organizations today operate data centers around the globe where massive amounts of data are produced and consumed by local users. Despite their geographically diverse origin, such data must be analyzed/mined as a whole. We call the problem of supporting rich DAGs of computation across geographically distributed data Wide-Area Big-Data (WABD). To the best of our knowledge, WABD is not supp...

Geo-social visual analytics

Spatial analysis and social network analysis typically consider social processes in their own specific contexts, either geographical or network space. Both approaches demonstrate strong conceptual overlaps. For example, actors close to each other tend to have greater similarity than those far apart; this phenomenon has different labels in geography (spatial autocorrelation) and in network scien...

Journal

Journal title: IEEE Transactions on Network and Service Management

Year: 2022

ISSN: 2373-7379, 1932-4537

DOI: https://doi.org/10.1109/tnsm.2022.3192710